A hierarchical K-NN classifier for textual data

Authors

  • Rehab Duwairi
  • Rania Al-Zubaidi
Abstract

This paper presents a classifier based on a modified version of the well-known K-Nearest Neighbors (K-NN) classifier. The original K-NN classifier was adjusted to work with category representatives rather than training documents. Each category was represented by a single document, constructed by consulting all of the category's training documents and then applying feature selection so that only important terms remain. As a result, a new document only needs to be compared with the category representatives, which are usually substantially fewer than the training documents. This modified K-NN was evaluated in a hierarchical setting, i.e. when categories are organized as a hierarchy. A new document similarity measure was also proposed. It focuses on the co-occurring, or matching, terms between a document and a category when calculating the similarity. This measure produces classification accuracy comparable to that obtained with the cosine, Jaccard or Dice similarity measures, yet it requires much less time. The TechTC-100 hierarchical dataset was used to evaluate the proposed classifier.
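
The idea of classifying against category representatives in a hierarchy can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula of the proposed similarity measure, so `matching_terms` is a hypothetical stand-in that only looks at terms shared by the document and the representative; feature selection is approximated by keeping the `top_n` most frequent terms; and the toy categories, the `tree` layout and all function names are invented for the example. Cosine similarity is included only as the familiar baseline.

```python
import math
from collections import Counter


def build_representative(docs, top_n):
    """Merge a category's training documents into one term-count vector and
    keep only the top_n most frequent terms (an assumed, simple stand-in for
    the feature selection step mentioned in the abstract)."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return dict(counts.most_common(top_n))


def cosine(doc_vec, rep_vec):
    """Standard cosine similarity between two sparse term-count vectors."""
    dot = sum(w * rep_vec.get(t, 0) for t, w in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_r = math.sqrt(sum(w * w for w in rep_vec.values()))
    return dot / (norm_d * norm_r) if norm_d and norm_r else 0.0


def matching_terms(doc_vec, rep_vec):
    """Hypothetical measure: share of the representative's weight carried by
    terms that also occur in the document. Only matching terms are visited."""
    total = sum(rep_vec.values())
    shared = doc_vec.keys() & rep_vec.keys()
    return sum(rep_vec[t] for t in shared) / total if total else 0.0


def classify(text, representatives, similarity=matching_terms):
    """Return the category whose representative is most similar to the text."""
    doc_vec = Counter(text.lower().split())
    return max(representatives,
               key=lambda c: similarity(doc_vec, representatives[c]))


def classify_hierarchical(text, tree, representatives,
                          similarity=matching_terms):
    """Walk down the hierarchy from 'root', choosing the best child at each
    level, and return the path of chosen categories."""
    node, path = "root", []
    while tree.get(node):
        children = {c: representatives[c] for c in tree[node]}
        node = classify(text, children, similarity)
        path.append(node)
    return path


# Toy data (invented for illustration).
training = {
    "football": ["goal striker football match", "football league goal"],
    "tennis": ["tennis racket serve match", "tennis serve ace"],
    "science": ["physics atom energy", "atom energy quantum"],
}
tree = {"root": ["sports", "science"], "sports": ["football", "tennis"]}

reps = {c: build_representative(d, top_n=5) for c, d in training.items()}
# An internal category is represented by the documents of its subcategories.
reps["sports"] = build_representative(
    training["football"] + training["tennis"], top_n=5)

path = classify_hierarchical("the striker scored a goal", tree, reps)
```

In this sketch a new document is compared with at most one representative per sibling category at each level, rather than with every training document, which is the source of the speed-up the abstract describes.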




Journal:
  • Int. Arab J. Inf. Technol.

Volume 8

Publication date: 2011